SMALL-WORLD GRAPHS

By Oskar Sandberg 

Chalmers University of Technology and Goteborg University 

Small-world graphs, which combine randomized and structured 
elements, are seen as prevalent in nature. Jon Kleinberg showed that 
in some graphs of this type it is possible to route, or navigate, between 
vertices in few steps even with very little knowledge of the graph itself. 

In an attempt to understand how such graphs arise we introduce 
a different criterion for graphs to be navigable in this sense, relat- 
ing the neighbor selection of a vertex to the hitting probability of 
routed walks. In several models starting from both discrete and con- 
tinuous settings, this can be shown to lead to graphs with the desired 
properties. It also leads directly to an evolutionary model for the cre- 
ation of similar graphs by the stepwise rewiring of the edges, and we 
conjecture, supported by simulations, that these too are navigable. 



1. Introduction. 

1.1. Shortcut graphs. Starting with the small-world model of Watts and
Strogatz [22], rewired graphs have been the subject of much interest. Such 
graphs are constructed by taking a fixed graph, and randomly rewiring some 
portion of the edges. Later models of partially random graphs have been 
created by taking a fixed base graph, and adding "long-range" edges between 
randomly selected vertices (see [19, 20]). The "small-world phenomenon," in
this context, is that graphs with a high diameter (such as a simple lattice) 
attain a very low diameter with the addition of relatively few random edges. 

Jon Kleinberg [11] studied such graphs, primarily ones starting from a
two-dimensional lattice, from an algorithmic perspective. He allowed for
$O(n)$ long-range edges and found that, not only would this lead to a small
diameter, but also that if the probability of two vertices having a long-range
edge between them had the correct relation to the distance between them in
the grid, the greedy routing path-length between vertices was small as well.
Greedy routing means, as the name implies, starting from one vertex and 
searching for another by always stepping to the neighbor that is closest to the 
destination. That the base graph is connected means that a nonoverlapping 
greedy path always exists, so the question regards the utility of the long- 
range contacts in shortening this path. Graphs where one can quickly route 
between two points using only local information at each step, as with greedy 
routing, are referred to as navigable. 

Initially, we will stay in the one-dimensional translation-invariant envi- 
ronment (i.e., with the vertices arranged on a circle). Later sections extend 
some of the results to other classes of graphs. In general, we will call graphs 
of the type discussed shortcut graphs and use the shorter term shortcut for 
the long-range edges. 

1.2. Contribution. While Kleinberg's results are important and have been 
a catalyst for much study, it is not fully understood how the rather arbitrary 
and precise threshold on the shortcut distribution might arise in practice. 
In this work, we present an alternative distributional requirement that asso- 
ciates the shortcut distribution with the hitting probabilities of walks under 
greedy routing. We study this in the canonical case of a single loop, and 
in a wider setting of graphs induced by the Voronoi tessellations of a
Poisson process. We show that distributions that meet a certain criterion,
which we call being "balanced," have $O(\log^2 n)$ mean routing times, similar
to the critical case in Kleinberg's model.

The relationship in this criterion naturally leads to a stepwise rewiring 
algorithm for shortcut graphs. The Markov chain on the set of possible 
shortcut configurations defined by this algorithm can easily be seen to have 
a stationary distribution with balanced marginals. Our analytic results can- 
not be applied directly in this case, because the stationary distribution has 
dependencies between the shortcuts at nearby vertices. However, we argue 
through heuristics and simulation that these dependencies in fact work in 
our favor, and that graphs generated by our algorithm can be efficiently 
navigated. 

1.3. Previous work. The roots of the recent work on navigable graphs 
are the papers by Jon Kleinberg [10, 11]. Further exposition is given in 
[1, 16, 17]. Continuum models similar to the ones discussed below have been 
introduced in [5, 9], and, in a more practical context, [15]. 

A very different algorithm that appears to produce navigable graphs has 
been independently proposed in [4], where it is tested by simulation. In [7] 
the emergence of navigable graphs is discussed in terms of a method for
small-world construction without requiring an understanding of the
geography, but the method developed is complicated and unnatural. An algorithm
similar to that proposed below is present in Freenet [2, 3, 23]; the work
below was in part inspired by attempts to place Freenet's algorithms in
environments more conducive to analysis.

A recent survey of the field by Kleinberg is [13]. In the final section, he 
identifies the question of how small-world graphs arise as one of the central 
questions in the field. 

2. Preliminaries. 

2.1. Decentralized routing. The central problem in this area of research 
is that of routing through a graph with only limited knowledge of the graph 
itself. That is, given two vertices x and y in a (di)graph G, we want to find a 
path connecting x and y. In general, the combinatorial problems of finding
such a path, and finding the shortest such path, are well-understood
problems involving $O(n)$ and $O(n^2)$ steps, respectively. The question becomes
more interesting if we allow some (but not all) of the information about
the graph to be known when determining the path. In particular, we know
the distances between vertices as given by a function $d(x,y)$. With such a
distance function, one may define a decentralized algorithm (following Klein- 
berg [11]) as an algorithm which, in each step, uses only information about 
the distances between vertices already seen in the route and the destination 
to decide where to go next. 

Definition 2.1. A decentralized algorithm for finding a path from a
point $y$ to $z$ in a graph $G$ associated with a distance function
$d : V \times V \to \mathbb{R}$ is defined as follows:

• Let $S_0 = \{y\}$.

• In step $k$, the algorithm chooses exactly one point $x$ in $N(S_{k-1})$ (the
set of all neighbors in $G$ of points in $S_{k-1}$) and appends this point to
create $S_k$. The choice of $x$ is a possibly random function of the subgraph
of $G$ induced by $S_{k-1}$, as well as the distance of all the vertices in
$N(S_{k-1})$ to each other and to $z$ as given by $d$. In particular, it may not
depend on the rest of $G$.

• The algorithm terminates in step $k$ if $z \in S_k$.

The definition is inspired by the small-world experiments [18] where peo- 
ple were enlisted to forward a letter to a stranger through friend-to-friend 
links. The people in the experiment knew something about the final recip- 
ient (typically where he lived and his occupation), so they could compare 
how "close" each acquaintance they considered sending the letter to was to 
the target, but they had no global knowledge of the social network itself.

For a decentralized algorithm to be able to perform better than a random
walk, it is necessary that $d(x,y)$ contains some information about the
structure of the graph. The extreme of this is where $d(x,y)$ is the graph
distance implied by $G$, the minimal distance from $x$ to $y$ in $G$, which we
denote $d_G(x,y)$. In this case routing is trivial: proceeding in each step to
the neighbor which is closest to $z$ will always find a minimal path. A more
typical situation is that $d(x,y)$ gives some, but not complete, information
regarding where to go. In particular, we shall say that $d(x,y)$ is adapted for
routing in a graph $G$ if, for any $z$ and any $x \in V$ with $x \neq z$, $x$ has
a neighbor $y$ such that $d(y,z) < d(x,z)$. When such a distance measure exists,
we can route to any point by always choosing such a $y$ as the next step,
though the path thus found may be far from optimal.
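
The routing rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is given as an adjacency dictionary, the names `greedy_route`, `ring` and `ring_dist` are hypothetical, the walk always steps from the current vertex (the simple form of greedy routing), and ties are resolved by list order rather than at random.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Vertex = Hashable


def greedy_route(
    neighbors: Dict[Vertex, Sequence[Vertex]],
    dist: Callable[[Vertex, Vertex], float],
    start: Vertex,
    target: Vertex,
) -> List[Vertex]:
    """Route from start to target, always stepping to the neighbor closest to target.

    Raises ValueError when no neighbor is strictly closer, that is, when
    `dist` is not adapted for routing at the current vertex.
    """
    path = [start]
    current = start
    while current != target:
        # Ties are resolved by list order here (an assumption of this sketch).
        best = min(neighbors[current], key=lambda v: dist(v, target))
        if dist(best, target) >= dist(current, target):
            raise ValueError(f"no neighbor of {current!r} is closer to {target!r}")
        path.append(best)
        current = best
    return path


# Hypothetical example: an undirected ring on 8 vertices plus one shortcut 4 -> 1.
n = 8
ring = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
ring[4] = ring[4] + [1]


def ring_dist(a: int, b: int) -> int:
    """Distance along the ring, which is adapted for routing in the augmented graph."""
    return min((a - b) % n, (b - a) % n)


example_path = greedy_route(ring, ring_dist, 4, 0)
```

In this toy graph the shortcut from 4 to 1 is used because 1 is closer to the target than either ring neighbor of 4; without it the route would need four steps.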

The common situation is to let $H$ be a fixed graph, and let $G$ be created by
randomly augmenting $H$ with further edges in order to create a semirandom
graph. It is then trivially true that $d_H(x,y)$ is adapted for routing in $G$.
The random edges need not be uniformly distributed, and indeed all the
interesting cases arise when the probability of an edge being added between $x$
and $y$ depends on $d_H(x,y)$. Some independence is usually assumed, however,
so that the edges previously seen in a route are independent of those in the
future. We let $\ell(x,y)$ denote the probability of adding an edge from $x$ to $y$.

Given such a random augmentation of edges, the question arises whether 
a decentralized algorithm can be found which efficiently routes through a 
family of graphs. In particular, for a family of finite graphs of bounded
degree that are indexed by size, is there a decentralized algorithm such that
the expected number of steps of a route between two points is asymptotically
small (by which we typically mean that it grows at most poly-logarithmically
with the size)?

In Kleinberg's original work [11], the underlying graph was $\mathbb{Z}^2$ (the
family of finite two-dimensional grids) with edges between adjacent vertices,
making $d(x,y)$ the $L^1$ metric (Manhattan distance). He proved that
poly-logarithmic routing was possible if
$\ell(x,z) = 1/(h_{\alpha,n}\, d(x,z)^{\alpha})$ with $\alpha = 2$ (where
$h_{\alpha,n}$ is the distribution's normalizer), but impossible for all other
values of $\alpha$. Kleinberg's results also cover the same situation in
$\mathbb{Z}^d$, in which case the single good value of $\alpha$ is exactly $d$.
Similar analysis has been done by others; see, for example, Barriere et al. [1]
for thorough analysis of the directed loop, and Duchon et al. [6] for a wider
class of graph families. In all these cases (as well as in [12, 14, 15, 21]) it
is found that efficient routing is possible when

$$\ell(x,y) \propto \frac{1}{\mathrm{Vol}(B_x(d(x,y)))} \tag{1}$$

where $B_x(r) = \{z : d(x,z) \leq r\}$, or some slight variation thereof. [We
will use this notation for the ball, as well as $S_x(r)$ for its boundary
throughout the paper.]

Similarly, it turns out that in all these cases, the decentralized algorithm
necessary is simply greedy routing, which means choosing in each step the
unexplored neighbor of the previously explored vertices which is closest to
the destination. When $d(x,y)$ is adapted for routing, greedy routing strictly
approaches the target with each step and is always successful. The nature
of the greedy paths through augmented graphs is the main emphasis of this
paper.
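
As a concrete illustration of an augmentation of the form (1), the following sketch uses a directed cycle on n vertices with edges x -> x-1 (mod n), where the ball of radius d contains on the order of d points, so the shortcut length is drawn with probability proportional to 1/d. The function names, the choice of n = 1024, the number of trials and the seed are all assumptions made for this example, not part of the original analysis.

```python
import random
from typing import List


def harmonic_distribution(n: int) -> List[float]:
    """ell[d] proportional to 1/d for d = 1..n-1; ell[0] = 0 is a placeholder."""
    weights = [0.0] + [1.0 / d for d in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_shortcuts(n: int, ell: List[float], rng: random.Random) -> List[int]:
    """One shortcut per vertex: gamma[x] = (x - D) mod n with P(D = d) = ell[d]."""
    lengths = rng.choices(range(1, n), weights=ell[1:], k=n)
    return [(x - length) % n for x, length in enumerate(lengths)]


def greedy_steps(n: int, gamma: List[int], start: int, target: int = 0) -> int:
    """Number of greedy steps from start to target on the directed cycle."""
    steps, x = 0, start
    while x != target:
        here = (x - target) % n
        via_shortcut = (gamma[x] - target) % n
        # Take the shortcut only if it lands closer to the target (no overshoot).
        x = gamma[x] if via_shortcut < here else (x - 1) % n
        steps += 1
    return steps


rng = random.Random(1)
n = 1024
ell = harmonic_distribution(n)
trials = 200
mean_steps = sum(
    greedy_steps(n, sample_shortcuts(n, ell, rng), rng.randrange(1, n))
    for _ in range(trials)
) / trials
```

Comparing `mean_steps` with the mean distance of roughly n/2 that a walk without shortcuts would need gives a quick feel for how much the harmonic shortcuts help.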

The following is a very coarse, obvious, upper bound on the routing time: 

Observation 2.2. If a distance function $d : V \times V \to \mathbb{R}$ is
adapted for routing in a graph $G = (V, E)$, then greedy routing from $x$ to $z$
takes a number of steps which is at most the cardinality of
$\{v \in V : d(v,z) < d(x,z)\}$.

2.2. Distribution and hitting probability. Consider an underlying graph
$H = (V, E)$, which may be directed but must be connected in the sense that
it contains a direction-respecting path from any vertex to any other. Let
$d(x,y)$ be the distance function implied by $H$, and let a random graph $G$
be constructed by augmenting $H$ with one random directed edge starting
at each vertex. The edges added by the augmentation will be denoted as
$\gamma : V \to V$. We call $\gamma$ a shortcut configuration, and let
$\Gamma = V^V$ be the set of all such functions. The general probability space
over which we will work is $\Gamma \times V \times V$, where the two copies of
$V$ represent the possible starts and destinations of walks. Let $P$ be a
probability measure on this set where the start and destination are chosen
uniformly and independently of each other and the configuration is chosen by
some shortcut distribution $\ell(\gamma)$ which in the independent selection
case may be factored into the form $\prod_{x \in V} \ell(x, \gamma(x))$.

On this space, we define $X_Z(t)$ for $t = 0, 1, \ldots$, as a greedy walk in the
graph from a uniformly chosen starting point $Y = X_Z(0)$ with a uniformly
chosen destination $Z$. To make the greedy walk well defined, we dictate that
ties are broken randomly (i.e., if the $m$ closest neighbors to the destination
are equally far from it, one is selected uniformly at random). Below, we will
in particular be interested in the hitting probability of greedy walks with
specific destinations. We define this formally as

$$h(x,z) = P(X_Z(t) = x \text{ for some } t \mid Z = z). \tag{2}$$

If $H$ is a translation-invariant graph, then $h(x,z) = h(x - z, 0)$ for some
distinguished vertex 0. Thus we will, without further loss of generality,
consider the hitting probability as a function of one variable and write
$h(x) = h(x, 0)$. Further, we will restrict our analysis to cases where
$\ell(x,y)$ and $h(x,y)$ are functions of $d(x,y)$ only (we call this distance
invariance).

Our results concern relating $h(x)$ to the occurrence of shortcuts between
vertices. Immediately, however, we can see that $h(x)$ gives us the expected
length of a greedy path. Since such a path can hit each point only once, it
follows that if $T$ is the length of a greedy path from a random point to zero,
then

$$T = \sum_{x \in V} \chi_{\{X_Z(t) = x \text{ for some } t\}}$$

whence it follows that

$$E[T] = \sum_{x \in V} h(x). \tag{3}$$

We will denote the expected greedy walk length $\tau = E[T]$.
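
The identity between hitting probabilities and path length can be checked empirically. The sketch below is an illustration only: it assumes the directed cycle x -> x-1 (mod n) with destination 0, draws every shortcut uniformly at random (an arbitrary choice made for the example), and starts each walk at a uniformly chosen vertex other than 0. It counts the path length as the number of steps, which equals the number of non-destination vertices visited; counting the destination as well shifts the sum by one.

```python
import random
from typing import List, Tuple


def walk_to_zero(gamma: List[int], start: int) -> List[int]:
    """Vertices visited by the greedy walk from start to 0 on the cycle x -> x-1."""
    path, x = [start], start
    while x != 0:
        # The distance to 0 is x itself, so the shortcut helps iff gamma[x] < x.
        x = gamma[x] if gamma[x] < x else x - 1
        path.append(x)
    return path


def estimate_hitting(n: int, trials: int, rng: random.Random) -> Tuple[List[float], float]:
    """Return (empirical hitting frequencies h[0..n-1], mean number of steps)."""
    hits = [0] * n
    total_steps = 0
    for _ in range(trials):
        gamma = [rng.randrange(n) for _ in range(n)]  # uniform shortcuts (assumption)
        path = walk_to_zero(gamma, rng.randrange(1, n))
        for v in path:
            hits[v] += 1
        total_steps += len(path) - 1
    return [c / trials for c in hits], total_steps / trials


h_est, mean_len = estimate_hitting(64, 2000, random.Random(0))
```

Because each walk visits every vertex at most once, `sum(h_est[1:])` and `mean_len` agree up to floating-point rounding, whatever shortcut distribution is plugged in.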

3. Rewiring by destination sampling. Before proceeding to analyze our 
main model, we present the rewiring algorithm which motivates it. Running 
the algorithm modifies, in each step, the destinations of some shortcut edges. 
It is a steady-state algorithm in the sense that the number of edges never 
changes: it simply shifts the destinations of the single existing shortcut at 
each vertex. 

In the sense that we propose a generative process which might explain 
why navigable graphs arise, this is similar to the celebrated preferential 
attachment model for power law graphs of Barabasi and Albert. However, 
it is not a growth model for the graph since the number of vertices and 
edges never changes, and is thus more similar to the variant of preferential 
attachment discussed in [8]. 

The proposed algorithm, which we call destination sampling, is as follows: 

Algorithm 3.1. For a given graph $H = (V, E)$, let $\gamma_s$ be a shortcut
configuration at time $s$. From each vertex there is exactly one shortcut. Let
$0 < p < 1$. Then $\gamma_{s+1}$ is defined as follows:

1. Choose $y_{s+1}$ and $z_{s+1}$ uniformly from $V$.

2. If $y_{s+1} \neq z_{s+1}$, do a greedy walk from $y_{s+1}$ to $z_{s+1}$ using
$H$ and the shortcuts of $\gamma_s$. Let $x_0 = y_{s+1}, x_1, x_2, \ldots, x_t = z_{s+1}$
denote the points of this walk.

3. For each $x_0, x_1, \ldots, x_{t-1}$ independently with probability $p$ replace
its current shortcut with one to $z_{s+1}$ [i.e., let $\gamma_{s+1}(x_i) = z_{s+1}$].

After a walk is made, $\gamma_{s+1}$ is the same as $\gamma_s$, except that the
shortcut from each vertex in walk $s + 1$ is with probability $p$ replaced by an
edge to the destination. In this way, the destination of each edge is a sample
of the destinations of previous walks passing through it (for a realization,
see Figure 1). We strongly believe that updating the shortcuts using this
algorithm eventually results in a shortcut graph with greedy path-lengths
of $O(\log^2 n)$. Though one can relate the stationary regime of this algorithm
to the balanced distributions (see below), a rigorous bound has not been
proved.

The value of $p$ is a parameter in the algorithm. It serves to disassociate
the shortcut of a vertex from those of its neighbors. For this purpose, the
lower the value of $p > 0$ the better, but very small values of $p$ will also
lead to slower sampling.
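
A minimal simulation of destination sampling might look as follows. It is a sketch under assumptions, not the implementation behind the paper's simulations: the base graph is taken to be the directed cycle x -> x-1 (mod n), the initial configuration points every shortcut at the cycle neighbor, and the values n = 256, p = 0.1, the number of iterations and the seed are arbitrary. The helper names are hypothetical.

```python
import random
from typing import List


def greedy_walk(n: int, gamma: List[int], start: int, target: int) -> List[int]:
    """Greedy walk on the directed cycle x -> x-1 (mod n) with shortcuts gamma."""
    path, x = [start], start
    while x != target:
        here = (x - target) % n
        via_shortcut = (gamma[x] - target) % n
        x = gamma[x] if via_shortcut < here else (x - 1) % n
        path.append(x)
    return path


def destination_sampling_step(n: int, gamma: List[int], p: float,
                              rng: random.Random) -> int:
    """One rewiring step, modifying gamma in place; returns the walk length."""
    y, z = rng.randrange(n), rng.randrange(n)
    if y == z:
        return 0  # nothing happens in this step
    path = greedy_walk(n, gamma, y, z)
    for x in path[:-1]:  # every vertex on the walk except the destination
        if rng.random() < p:
            gamma[x] = z  # re-point the shortcut at the walk's destination
    return len(path) - 1


rng = random.Random(0)
n, p = 256, 0.1
gamma = [(x - 1) % n for x in range(n)]  # start without any useful shortcuts
lengths = [destination_sampling_step(n, gamma, p, rng) for _ in range(20000)]
late_mean = sum(lengths[-2000:]) / 2000
```

The walk is completed with the old shortcuts before any rewiring takes place, matching the order of steps 2 and 3. Tracking `late_mean` for growing n is one way to probe the conjectured path-length scaling.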

3.1. Markov chain view. Each application of Algorithm 3.1 defines the
transition of a Markov chain on the set of shortcut configurations, $\Gamma$. The
Markov chain in question is defined on a finite (if large) state space. If it is
irreducible and aperiodic, it thus converges to a unique stationary
distribution.

Proposition 3.2. The Markov chain $(\gamma_s)_{s \geq 0}$ is irreducible and
aperiodic.

Proof. Aperiodic: There is a positive probability that $y_s = z_s$, in which
case nothing happens at step $s$.

Irreducible: We need to show that there is a positive probability of going
from any shortcut configuration to any other in some finite number of steps.
This follows directly if there is a positive probability that we can "re-point"
the shortcut starting at a vertex $y$ to point at a given target $z$ without
changing the rest of the graph. But the probability of this happening in a
single iteration is at least

$$P(Y = y,\ Z = z,\ \text{and only } y \text{ rewired}) \geq \frac{1}{n} \cdot \frac{1}{n}\, p\, (1 - p)^{n-1} > 0. \qquad \Box$$

Fig. 1. A shortcut graph generated by our algorithm (n = 100).

Thus there does exist a unique stationary shortcut distribution, which
assigns some positive probability to every configuration. The goal is to
argue that this distribution leads to short greedy walks.

Proposition 3.3. Under the unique stationary distribution of the Markov
chain $(\gamma_s)_{s \geq 0}$,

$$\ell(x,z) = \frac{h(x,z)}{\sum_{\xi \neq x} h(x,\xi)}.$$



Proof. As selected by the algorithm, the shortcut from a vertex $x$ at
any time is simply a sample of the destinations of the previous walks that
$x$ has seen. Under the stationary distribution this should not change with
time, so

$$\ell(x,z) = P(Z = z \mid X_Z(t) = x \text{ for some } t).$$

Using Bayes' theorem, this can be seen as a statement relating $\ell$ to the
hitting probability, that is,

$$\ell(x,z) = P(Z = z \mid X_Z(t) = x \text{ for some } t)
= \frac{P(X_Z(t) = x \text{ for some } t \mid Z = z)\, P(Z = z)}
{\sum_{\xi \neq x} P(X_Z(t) = x \text{ for some } t \mid Z = \xi)\, P(Z = \xi)}.$$

The first factor in the numerator is the hitting probability $h(x,z)$. The
formula then follows from the uniform distribution of $Z$ and translation
invariance. □

4. Balanced shortcut distributions. We use Proposition 3.3 as the start- 
ing point of our analysis, defining the class of all distributions having the 
same marginal property as follows. 

Definition 4.1. If a graph $H$ with distance function $d(x,y)$ is randomly
augmented such that

$$\ell(x,z) = \frac{h(x,z)}{\sum_{\xi \neq x} h(x,\xi)} \tag{4}$$

where $h$ is given by (2), then the joint distribution of shortcuts is said to be
balanced.

We will show for several classes of graphs that this relationship leads to
navigability, allowing for a characterization other than that of (1). Besides
the relationship with Algorithm 3.1, balance is in some ways a natural
requirement. The right-hand side of (4) describes the distribution of the
destinations of walks that hit the point $x$, so our results simply say that a
good choice of shortcuts is one that matches this.

Theorem 4.2. For a translation-invariant graph $H$, there exists a
balanced distribution which selects shortcuts independently at each vertex.

Proof. Like before, we let $\ell(x,y)$ be the marginal probability that $x$
has a shortcut to $y$. The joint distribution is simply the product over all
vertices.

For a single walk toward a given $z$, we may view $X_z(t)$ as a Markov
chain on the set of vertices, with some transition kernel $P_z(y,x)$. As above,
we will set $z = 0$ and drop the index in the calculations below without loss
of generality. The process hits every point except $z = 0$ at most once, and
we can let this point be absorbing. The transition kernel $P$ then consists of
two mechanisms: either we step to $x$, which is closer to 0 than $y$, because it
is the destination of the shortcut from $y$, or we step to one of $y$'s neighbors
in $H$ because $y$'s shortcut leads to somewhere from which it is further to
0 than $y$.

Let $N(x)$ be the set of neighbors of $x$ in $H$, and let
$L(x) = \{\xi \in V : d(\xi,0) \geq d(x,0),\ \xi \neq x\}$ be the set of vertices
at least as far as $x$ from 0. Also, let
$P(x) = \{\xi \in N(x) : d(\xi,0) > d(x,0),\ (\xi,x) \text{ edge in } H\}$ (the set
of "parent" vertices that can greedy route to $x$ in $H$) and
$C(x) = \{\xi \in N(x) : d(\xi,0) < d(x,0),\ (x,\xi) \text{ edge in } H\}$ (the set
of "child" vertices that $x$ can route to). Writing
$\ell(y, A) = \sum_{\xi \in A} \ell(y,\xi)$, the transition kernel of the process
described is

$$P(y,x) = \ell(y,x)\, \chi_{\{y \in L(x)\}} + \frac{\ell(y, L(y))}{|C(y)|}\, \chi_{\{y \in P(x)\}}$$

for $y \neq 0$, and $P(0,x) = \chi_{\{x = 0\}}$.

We can thus express the hitting probability for any $x \neq 0$ for a greedy
walk as

$$h(x) = \sum_{\xi \in L(x)} h(\xi)\, \ell(\xi,x)
+ \sum_{\xi \in P(x)} h(\xi)\, \frac{\ell(\xi, L(\xi))}{|C(\xi)|}
+ \frac{1}{n-1}. \tag{5}$$

The first two terms in (5) represent the probability that we enter $x$ through
either a shortcut or from a parent vertex, respectively, and the last term is
the probability that the walk starts at $x$.

Note that, for any $x$, (5) gives a recursive definition of $h(x)$ in terms of
the distribution $\ell$. Fix such a distribution $\ell'$. From this we can thus
calculate the hitting probabilities $h'(x)$, and define

$$\ell''(x) = \frac{h'(x)}{\sum_{\xi \neq 0} h'(\xi)}.$$

The mapping $\ell' \mapsto \ell''$ is continuous since $\sum_{x \in V} h'(x) \geq 1$
and maps the simplex of $(n-1)$-valued distributions into itself. Since the
simplex is compact and convex, Brouwer's fixed-point theorem gives the existence
of at least one fixed point $\ell^*$, which is a balanced distribution. □



5. The directed cycle. We let $H$ be the directed cycle on $n$ vertices,
which will be numbered 0 through $n - 1$ such that the edges are directed
downward (modulo $n$). The implied distance function (which is not
symmetric) is

$$d(x,y) = \begin{cases} x - y, & \text{if } y \leq x, \\ n - y + x, & \text{otherwise.} \end{cases}$$

This environment is perhaps the most natural one for greedy routing, and
has previously been the subject of a thorough analysis by [1]. There exists
exactly one point at each distance from 0, and greedy routing means selecting
the shortcut if and only if its destination lies between 0 and the current
position. Equation (5) here simplifies to

$$h(x) = \sum_{\xi = x+1}^{n-1} h(\xi)\, \ell(\xi - x) + h(x+1) \sum_{\xi = x+2}^{n-1} \ell(\xi) + \frac{1}{n-1}.$$
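
The recursion for the directed cycle lends itself to a small numerical experiment. The sketch below computes h from a length distribution ell by working downward from x = n-1, using three contributions: arrival by a shortcut of length xi - x from a vertex xi > x, arrival from x+1 along the cycle when the shortcut at x+1 has length at least x+2 and so overshoots 0, and a start at x with probability 1/(n-1). It then iterates ell -> h / sum(h) with damping. The summation limits, the damping factor, the iteration count and the function names are assumptions of this sketch; Brouwer's theorem guarantees a fixed point but not that this iteration converges to it.

```python
from typing import List


def hitting_probabilities(ell: List[float]) -> List[float]:
    """Given ell[d] for d = 1..n-1 (ell[0] ignored), return h[0..n-1] for destination 0."""
    n = len(ell)
    # tail[m] = sum of ell[d] over d >= m, with zeros past the end.
    tail = [0.0] * (n + 2)
    for d in range(n - 1, 0, -1):
        tail[d] = tail[d + 1] + ell[d]
    h = [0.0] * (n + 1)  # h[n] = 0 is a sentinel for the "parent" of n-1
    for x in range(n - 1, 0, -1):
        via_shortcut = sum(h[xi] * ell[xi - x] for xi in range(x + 1, n))
        via_parent = h[x + 1] * tail[x + 2]  # shortcut at x+1 overshoots 0
        h[x] = via_shortcut + via_parent + 1.0 / (n - 1)
    h[0] = 1.0  # every walk ends at the destination
    return h[:n]


def balanced_distribution(n: int, iterations: int = 100,
                          damping: float = 0.5) -> List[float]:
    """Damped fixed-point iteration toward a balanced length distribution."""
    ell = [0.0] + [1.0 / (n - 1)] * (n - 1)  # start from the uniform distribution
    for _ in range(iterations):
        h = hitting_probabilities(ell)
        tau = sum(h[1:])
        target = [0.0] + [h[d] / tau for d in range(1, n)]
        ell = [(1 - damping) * a + damping * b for a, b in zip(ell, target)]
    return ell


ell_star = balanced_distribution(64)
tau_star = sum(hitting_probabilities(ell_star)[1:])
```

As a sanity check, a distribution concentrated on length 1 makes every shortcut coincide with the cycle edge, and the recursion then returns h(x) = (n - x)/(n - 1). The quantity `tau_star` is the expected number of greedy steps under the approximately balanced distribution.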

To prove our result in this environment, we will need the following lemma: 



Lemma 5.1. If the shortcut configuration is chosen according to a distance- 
invariant joint distribution, then h{x) is nonincreasing in x. 

Proof. Let $I \subset \Gamma \times V$ be the event consisting of all
configurations and starting points such that a greedy walk for 0 hits the point
$x + 1$. Now we shift all the coordinates of this set down by one (modulo $n$),
and call the translated set $J$. By the definition and distance invariance

$$h(x + 1) = P(I) = P(J).$$

However, every element in $J$ corresponds to a starting point and shortcut
configuration for which the greedy walk hits $x$. To see this, we pick a starting
point $y$ and configuration $\gamma$, such that $(\gamma, y) \in I$. This means
that there is an integer $m$ and a path $x_0, x_1, \ldots, x_m$ such that
$x_0 = y$, $x_m = x + 1$ and either

$$n - 1 \geq \gamma(x_i) > x_i \quad \text{and} \quad x_{i+1} = x_i - 1$$

or

$$x_i > \gamma(x_i) \geq x + 1 \quad \text{and} \quad x_{i+1} = \gamma(x_i)$$

for all $i = 0, 1, \ldots, m - 1$. The corresponding configuration in $J$, which
we denote $\gamma'$, has a similar path $x'_0, \ldots, x'_m$ ($x'_i = x_i - 1$)
where $x'_0 = y - 1$, $x'_m = x$ and either

$$n - 2 \geq \gamma'(x'_i) > x'_i \quad \text{and} \quad x'_{i+1} = x'_i - 1$$

or

$$x'_i > \gamma'(x'_i) \geq x \quad \text{and} \quad x'_{i+1} = \gamma'(x'_i)$$

for all $i = 0, 1, \ldots, m - 1$. This means that starting in $y - 1$ will cause
the greedy walk to hit $x$. [Note that not every configuration and starting point
that cause greedy walks to hit $x$ are necessarily in $J$, since $\gamma'(x'_i)$
must be at most $n - 2$ rather than $n - 1$ in the first line.]
It now follows directly that

$$P(J) \leq h(x). \qquad \Box$$

We can now show that greedy routing here takes a similar number of
steps to the critical case in Kleinberg's model.

Theorem 5.2. For every $n = 2^k$ with $k \geq 3$, the shortcut graph with
shortcuts selected independently according to a balanced distribution has an
expected greedy routing time

$$\tau \leq 2k^2.$$

The proof method is similar to that introduced by Kleinberg for
augmentations described by (1), but the implicit definition of the shortcut
distribution requires a somewhat more involved approach.

Proof of Theorem 5.2. Assume that $\tau > 2k^2$. We will show that for
$k$ sufficiently large this always leads to a contradiction.

To start with, divide $\{1, 2, \ldots, n - 1\}$ into at most $k$ disjoint phases.
Each phase is a connected set of points, each successively further from the
destination 0, and they are selected so that a greedy walk is expected to spend
the same number of steps in each phase. Thus, the first phase is the interval
$F_1 = \{1, \ldots, r_1\}$ where $r_1$ is the smallest number such that

$$\ell(F_1) = \sum_{\xi = 1}^{r_1} \ell(\xi) \geq 1/k.$$

Fig. 2. Illustration of the proof of Theorem 5.2. If a phase covers less than
half of the "remaining ground," then any shortcut whose length equals the
distance from 0 of a point in the phase takes us out of the phase.



The second phase is defined similarly as the shortest interval
$F_2 = \{r_1 + 1, \ldots, r_2\}$ such that $\ell(F_2) \geq 1/k$. Let $m$ be the
total number of such intervals which can be formed, and let $F_R$ denote the
remaining interval $\{r_m + 1, \ldots, n - 1\}$, which could be empty. By
construction $\ell(F_R) < 1/k$ and the total number of phases, including $F_R$,
is at most $k$.
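
The phase construction is easy to reproduce numerically. The sketch below splits the distances 1, ..., n-1 into consecutive shortest intervals of mass at least 1/k and returns the leftover interval separately. For illustration it is applied to a harmonic length distribution (ell proportional to 1/d) with k = 10; that choice of distribution, and the function name `split_into_phases`, are assumptions of the example rather than part of the proof.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def split_into_phases(ell: List[float], k: int) -> Tuple[List[Interval], Interval]:
    """Split distances 1..n-1 into phases (lo, hi), each the shortest interval
    following the previous one whose ell-mass is at least 1/k.

    Returns (phases, remainder); the remainder (lo, n-1) is empty when lo > n-1.
    """
    n = len(ell)
    phases: List[Interval] = []
    lo, mass = 1, 0.0
    for d in range(1, n):
        mass += ell[d]
        if mass >= 1.0 / k:
            phases.append((lo, d))
            lo, mass = d + 1, 0.0
    return phases, (lo, n - 1)


k = 10
n = 2 ** k
weights = [0.0] + [1.0 / d for d in range(1, n)]
total = sum(weights)
ell = [w / total for w in weights]  # harmonic distribution, used only as an example
phases, remainder = split_into_phases(ell, k)
```

For the harmonic example the phases grow roughly geometrically in length, which is the picture behind the halving argument in the remainder of the proof.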

Before proceeding, we need to bound by how much $\ell$ of the different
phases can deviate from one another since this will also tell us by how much
the expected number of steps in each phase can differ. From (4) and the
assumed lower bound of $\tau$, it follows that

$$\ell(x) = \frac{h(x)}{\tau} \leq \frac{1}{2k^2}$$

for all $x$. This implies that

$$\frac{1}{k} \leq \ell(F_i) \leq \frac{1}{k} + \frac{1}{2k^2}$$

for all $i \in \{1, \ldots, m\}$, and thus

$$\ell(F_i) \leq \left(1 + \frac{1}{2k}\right) \ell(F_j) \tag{6}$$

for all $i, j \in \{1, \ldots, m\}$. It also gives $m \geq k^2/(k + 1) - 1$.

Consider now $F_m = \{r_{m-1} + 1, r_{m-1} + 2, \ldots, r_m\}$ and let
$L = \{0, 1, \ldots, r_{m-1}\}$. Our goal is to show that, from any point in
$F_m$, there is a considerable probability of having a shortcut to $L$. We know
that $r_m < n$. Assume that $r_{m-1} > r_m/2$. Then $F_m$ covers less than half
of the distance from $r_m$ to the target. In particular

$$\{r_m - F_m\} \subset L.$$

Thus, if $r_m$ has a shortcut with destination in $\{r_m - F_m\}$, any walk which
hits $r_m$ will leave $F_m$ in the next step (see Figure 2). We thus know that

$$\ell(r_m, L) \geq \ell(r_m, \{r_m - F_m\}) = \ell(F_m) \geq 1/k.$$

Lemma 5.1 tells us that the probability of having a shortcut to $L$ cannot
decrease for points less than $r_m$, so for each vertex that the walk hits within
$F_m$, there is an independent probability of at least $1/k$ of leaving $F_m$ in
the next step. This means that the expected number of steps the walk can take
in $F_m$ is at most $k$.

The expected number of steps in a phase is $h(F_i) = \tau\, \ell(F_i)$ so, by (6),
it then holds that

$$h(F_i) \leq (1 + 1/2k)\, h(F_m) \leq k + 1/2 \tag{7}$$

for all $i \in \{1, \ldots, m\}$ and also for $F_R$. There are at most $k$ phases,
so this implies that $\tau \leq k^2 + k/2$, which contradicts our assumption for
all $k \geq 2$.

Thus the original assumption implies that $r_{m-1} \leq r_m/2 < n/2$. But by an
identical argument for $F_{m-1}$ we can show that $r_{m-2} \leq r_{m-1}/2$. It
follows by iteration that

$$r_j \leq \frac{1}{2^{m-j}}\, r_m$$

and in particular

$$r_1 \leq \frac{1}{2^{m-1}}\, n \leq 2^{(k+2)/(k+1)} \leq 4.$$

This means that $F_1$ contains at most four points, so $h(F_1) \leq 4$ and thus,
again by (6), $h(F_i) \leq 5$ for all $i$. For $k \geq 3$ this contradicts the
original assumption. This completes the proof. □

Theorem 5.2 gives us an alternative distributional criterion for attaining
$O(\log^2 n)$ expected greedy path-lengths. Since Kleinberg showed that
this cannot hold for many distributions, the balanced distributions must be
"close" to the critical, harmonic decay. More specifically, drawing on the
proofs that navigability is not possible for most such graphs, we can see that
there cannot exist $\delta > 0$, $\varepsilon > 0$ and $N \in \mathbb{N}$ such
that $\ell(\{1, 2, \ldots, n^{\delta}\}) < n^{-\varepsilon}$ for the cycles of
size $n \geq N$. This would be the case if the tails of the distributions
dominated a power law decay with exponent $\alpha < 1$. Similarly, there cannot
exist (possibly different) $\delta > 0$, $\varepsilon > 0$ and $N \in \mathbb{N}$
such that $\ell(\{n^{\delta}, n^{\delta} + 1, \ldots, n - 1\}) < n^{-\varepsilon}$
for the cycles of size $n \geq N$, as would be the case if the tails were
dominated by a power law with exponent $\alpha > 1$.

6. Delaunay graphs. The small-world theory is not necessarily limited to
situations where vertices are placed in a fixed grid. In this section, we will let
the vertices be points of a spatial Poisson process, and the distance function
be the Euclidean metric. For simplicity, we will relax our requirements a
little and let the graphs have degrees bounded in expectation, rather than
uniformly.

Let $S^d$ be the $d$-dimensional surface of a $(d + 1)$-ball with radius such
that the volume/area of $S^d$ is 1. We let $V = \{x_i\}$ be the points of a
homogeneous Poisson process with intensity $\lambda = n^d$ in this space. From
this Poisson process we may construct the Voronoi tessellation, that is, the
collection of cells $C(x_i)$ where

$$C(x_i) = \{y \in S^d : |y - x_i| \leq |y - x_j| \text{ for all } j\}.$$

$C(x_i)$ is that part of the space which is at least as close to $x_i$ as to any
other point. The Voronoi cells are closed convex polyhedrons that border other
cells along each side, thus overlapping on sets of Lebesgue measure zero.

The tessellation induces a graph $G$ with vertices $V$ (known as the Delaunay
graph) as follows. Let $G = (V, E)$, where $(x,y) \in E$ if and only if $C(x)$
and $C(y)$ intersect in an infinite number of points (this is a.s. equivalent to
intersecting in at least one point). Intuitively, this is the graph that connects
a vertex $x$ to all its neighbors in the tessellation. This Delaunay graph is a
natural base graph for greedy routing among the points.

Lemma 6.1. Let $\{x_i\}$ be any point-set in $S^d$, and $G$ its Delaunay graph.
Then the Euclidean metric $d(x,y) = |x - y|$ is adapted for routing in $G$.

Proof. We must prove that for all $x \neq z \in V$, there exists $y \in V$ (which
may be $z$) such that $(x,y) \in E$ and $|x - z| > |y - z|$. Consider the line
$xz$. Let $w$ be the first point we encounter as we move from $x$ along $xz$,
satisfying $w \in C(y)$ for some $y \neq x$ ($w$ is well defined since the cells
are compact).

It is clear that $w \in C(y)$ for some $y$ such that $x$ and $y$ are connected in
the Delaunay graph [$C(x)$ must border at least one cell that it meets at $w$].
Clearly, $|y - w| = |x - w|$ since $w$ is in both cells. Thus

$$|y - z| < |y - w| + |w - z| = |x - w| + |w - z| = |x - z|$$

where the strict inequality follows from the fact that $w$ is not on the line
$yz$. □

Given this graph, we consider augmentations that allow for fast routing.
A direct approach would be to connect a given vertex to any other with
a probability depending on the distance between them, but this leads to
complications regarding dependencies between the progress made at each
step (though not insurmountable ones; see [5] for such an approach in a
similar environment).

Instead, we augment the graph as follows. For each vertex $x \in V$, let
$\{n_i(x)\}_{i=1}^{N}$ be the points of a nonhomogeneous Poisson process given by
the measure $\mu_x(A) = \ell_x(A \setminus C(x))$ for some shortcut measure
$\ell$ on the Borel sets of $S^d$, and $\ell_x(A) = \ell(A - x)$.

We then augment $G$ by adding a directed edge from $x$ to $y$ if
$n_i(x) \in C(y)$ for any $i = 1, \ldots, N$.

Lemma 6.2. If $x, z \in V$ and $|z - n_i(x)| < |z - x|/4$ for some
$i = 1, 2, \ldots, N$, then $x$ has a shortcut to a vertex $y \in V$ (which may
be $z$) such that $|z - y| < |z - x|/2$.

Proof. Let $w$ be such an $n_i(x)$. With probability 1 it is in exactly one
cell $C(y)$. If $y = z$, then $x$ has a shortcut to $z$; otherwise
$|w - y| \leq |w - z|$. In the latter case,

$$|z - y| \leq |z - w| + |w - y| \leq 2|w - z| < |x - z|/2. \qquad \Box$$
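
The triangle-inequality argument behind Lemma 6.2 can be checked numerically. The sketch below is only an illustration under simplifying assumptions: it works in the flat unit square rather than on the sphere $S^d$, uses uniformly random points instead of a Poisson process, and finds the cell containing a point by a brute-force nearest-vertex search. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_vertex(vertices: List[Point], w: Point) -> Point:
    """The vertex y whose Voronoi cell C(y) contains w, i.e. the vertex closest to w."""
    return min(vertices, key=lambda v: math.dist(v, w))


def check_lemma_6_2(vertices: List[Point], x: Point, z: Point, w: Point) -> bool:
    """If |z - w| < |z - x| / 4, the shortcut target y satisfies |z - y| < |z - x| / 2."""
    if math.dist(z, w) >= math.dist(z, x) / 4:
        return True  # hypothesis of the lemma not met, nothing to check
    y = nearest_vertex(vertices, w)
    return math.dist(z, y) < math.dist(z, x) / 2


rng = random.Random(0)
vertices = [(rng.random(), rng.random()) for _ in range(200)]
results = []
for _ in range(1000):
    x, z = rng.sample(vertices, 2)
    radius = rng.random() * math.dist(z, x) / 4  # inside the ball of radius |z - x| / 4
    angle = rng.random() * 2 * math.pi
    w = (z[0] + radius * math.cos(angle), z[1] + radius * math.sin(angle))
    results.append(check_lemma_6_2(vertices, x, z, w))
```

Every entry of `results` should be true, since the vertex nearest to `w` is never farther from `w` than `z` is.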

6.1. Kleinberg augmentation. To motivate the model, we first show that
augmentation along the lines of Kleinberg's model allows for an
$O(\log^2 n)$ bound on the routing time. That is, as in (1), we let the
augmentation be given by the measure

$$\ell(A) = \int_A \frac{dr}{4 \log n\, \mathrm{Vol}(r)} \tag{8}$$

where $\mathrm{Vol}(r)$ is the volume of a ball of radius $r$ in $S^d$. The
measure is defined on sets $A \subset S^d \setminus \{0\}$.

Before proving an upper bound on the expected routing time, we need to
ensure that we are not adding an unbounded number of edges.

Lemma 6.3. The expected number of shortcuts added to each vertex un- 
der augmentation with intensity (8) is bounded by a constant. 

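Proof. First note that $E[\#\text{shortcuts added to } x] \le E[N(x)]$. Now, let
$R(x) = \inf\{|y - x| : y \in V, y \ne x\}$. If $R(x) = \delta$, then all points within distance
$\delta/2$ of $x$ are in $C(x)$. Thus

\[
E[N(x) \mid R(x) = \delta] \le \frac{1}{\log n} \int_{|y| \ge \delta/2} \frac{1}{\mathrm{Vol}(|y|)}\,dy
\le \frac{c}{\log n} \int_{\delta/2}^{1} \frac{1}{r}\,dr = \frac{c \log(2/\delta)}{\log n}.
\]

Hence, and since $E[N(x) \mid R(x) = \delta]$ is decreasing in $\delta$,

\begin{align*}
E[N(x)] &= \int_0^\infty E[N(x) \mid R(x) = \delta]\,dP(R(x) \le \delta) \\
&\le E[N(x) \mid R(x) = 1/n]\,P(R(x) > 1/n) + \frac{c}{\log n} \int_0^{1/n} \log(2/\delta)\,dP(R(x) \le \delta) \\
&\le 2c + \frac{c\,n^d}{\log n} \int_0^{1/n} \log(2/\delta)\,S(\delta)\,d\delta,
\end{align*}

where $S(\delta)$ is the area of a sphere of radius $\delta$, and $c$ is a constant independent
of $n$. Since $S(\delta)$ is of order $\delta^{d-1}$, the last term is bounded in $n$. □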

The proof of the following theorem uses the by-now standard argument 
from [11]. 

Theorem 6.4. For every $n$ sufficiently large, the shortcut graph created
by augmenting the Poisson-Delaunay graph with intensity (8) has an
expected greedy routing time of $O(\log^2 n)$.

Proof. Let the route currently be at the vertex $x$, such that $|x - z| =
d > 1/n$. Let $B$ be the event that $|n_i(x) - z| < d/4$ for some $i$. Then



\[ P(B) \ge \frac{\mathrm{Vol}(d/4)}{\mathrm{Vol}(3d/4)\,\log n} \ge \frac{c}{\log n}. \]

By Lemma 6.2, if such an $n_i(x)$ exists, then $x$ has a neighbor within distance
$d/2$ of $z$, and greedy routing at least halves the distance to $z$ in the next
step. If B fails to occur, then we know by Lemma 6.1 that greedy routing 
can still progress to a point closer to the destination, and whether or not B 
occurs is independent of previous steps. Thus the expected number of steps 
until the distance to the target is halved is O(logn), which together with 
Lemma 2.2 proves the result. □ 



6.2. Balanced augmentation. In order to derive a result similar to The- 
orem 5.2 for the Delaunay setting, we will need to redefine the "balanced 
distribution" somewhat. In particular, we need to marginalize over the po- 
sitions of the Poisson points. 

Let the hitting measure of $A \subset S^d \setminus \{0\}$ be defined by

\[ h_z(A) = E[\text{number of } t \text{ s.t. } X_z(t) \in A \mid Z = z], \]

where $X_z(t)$ is the greedy routing process as above, and the existence of
a point at $z$ is included in the conditioning. Note that, by the translation
invariance of the construction, $h_z(A) = h_0(A - z)$.
We call a distribution Poisson-balanced if

(9)  $\ell(A) = \frac{h_0(A)}{\tau}$,

where $\tau = E[\text{length of a greedy walk}] = h_z(S^d \setminus \{0\})$.
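
As a rough illustration of the normalization in (9), the sketch below estimates the hitting measure on single points from a collection of recorded greedy walks and divides by the mean walk length. It assumes a discretized setting in which each walk is stored as the list of positions, relative to its destination, occupied before arrival; the function name and this data layout are hypothetical and chosen only for the example.

```python
from collections import Counter


def empirical_balanced(walks):
    """Estimate l({a}) = h_0({a}) / tau from recorded greedy walks.

    Each walk lists the offsets (position minus destination) visited before
    arrival, so the destination itself (offset 0) never appears.
    Returns the estimated distribution and the estimated mean walk length.
    """
    visits = Counter()
    for walk in walks:
        visits.update(walk)
    n_walks = len(walks)
    # Expected number of visits to each offset: an estimate of h_0({a}).
    hitting = {a: count / n_walks for a, count in visits.items()}
    # Total mass of the hitting measure equals the mean walk length tau.
    tau = sum(hitting.values())
    return {a: h / tau for a, h in hitting.items()}, tau
```

For example, the two walks `[5, 2, 1]` and `[3, 1]` give a mean length of 2.5 and put twice as much mass on offset 1 as on any other visited offset.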

Lemma 6.5. There exists a Poisson-balanced distribution. 

Proof. The proof is similar to the discrete case. A given shortcut measure
$\ell'$ induces a hitting measure $h_0(A)$, which in turn gives rise to a measure
$\ell''$ via (9). If we let $L$ be the space of measures of total probability 1
on $S^d \setminus \{0\}$ equipped with the total variation metric

\[ d_{TV}(\mu, \nu) = \sup_{A \in \mathcal{B}(S^d \setminus \{0\})} |\mu(A) - \nu(A)|, \]

then the mapping $\ell' \mapsto \ell''$ is a mapping from $L$ to itself. $L$ is convex and
compact, so it suffices to show that the mapping is continuous for us to
apply Brouwer's fixed-point theorem.

Since we know that $\tau \ge 1$, the second step of the mapping is certainly
continuous. The first is also, since the hitting probability depends only on
a finite number of random variables with distribution depending on $\ell'$.
Formally:

Take $\varepsilon > 0$ and any $m = n^d$. Let $\ell_1$ and $\ell_2$ be two shortcut measures.
Without loss of generality, we assume that $\ell_2 \ge \ell_1$, and we let $d_{TV}(\ell_1, \ell_2) < \varepsilon'$,
where

\[ \varepsilon' = \frac{\varepsilon}{3m \max((e - 1)m, \log(3m/\varepsilon))}. \]

We couple the routing processes $X_0^1$ and $X_0^2$ by letting them use the same
set of Poisson process distributed vertices $V$, and the same starting point
$z$. At each $x \in V$, we construct shortcuts $n_i(x)$ according to $\ell_1$ which both
processes may use, and then add an additional set of shortcuts $\{n'_i(x)\}$
according to $\ell_2 - \ell_1$, which only $X_0^2$ may use.

It follows that for any $x$, the cardinality of $\{n'_i(x)\}$, $N'(x)$, is dominated
by a Poi($\varepsilon'$) random variable, so

\[ P(N'(x) > 0) \le 1 - e^{-\varepsilon'} \le \varepsilon'. \]

Let $B$ be the event that some vertex $x$ in $V$ has $N'(x) > 0$. Then

\begin{align*}
P(B) &\le P(B \mid |V| \le (e - 1)m + q) + P(|V| > (e - 1)m + q) \\
&\le ((e - 1)m + q)\varepsilon' + e^{-q} \le \frac{\varepsilon}{m},
\end{align*}

where the last inequality follows from setting $q = \log(3m/\varepsilon)$.

Now let $H_1(A)$ and $H_2(A)$ be the number of points reached in a subset
$A \subset S^d \setminus \{0\}$ by the respective processes. If $h_0^1$ and $h_0^2$ are the respective
hitting probabilities, then

\begin{align*}
|h_0^1(A) - h_0^2(A)| &\le E[|H_1(A) - H_2(A)|] \\
&\le E[|V|]\,P(H_1(A) \ne H_2(A)) \le \varepsilon,
\end{align*}

since $H_1(A) = H_2(A)$ if no vertex has different shortcuts in the two cases.
This completes the proof. □

In order to bound the routing time in this case, we will need the following 
geometrical fact. 

Lemma 6.6. There exists $q \in (0, 1)$ such that, if $x$ and $y$ are points in
$S^d$ satisfying $(3/4)\delta < |x - y| < \delta$, and $(3/4)\delta < r < \delta$, then the portion
of the sphere $S_r(y)$ which lies inside $B_{(3/8)\delta}(x)$ is at least $q$. The constant $q$
depends on $d$ but not on $\delta$ and $r$.

This follows directly from the fact that the statement is independent of 
scale. In one dimension q = 1/2 trivially, and in two it can easily be seen 
that it is at least 1/8; see Figure 3. 

Theorem 6.7. For every sufficiently large $k$ and $n = (4/3)^k$, the shortcut
graph created by augmenting the Poisson-Delaunay graph with a
Poisson-balanced distribution has an expected greedy routing time $\tau \le k^2/q$.

Proof. We let $X_0$ be the routing process for zero, and define $h_0$ on
$S^d \setminus \{0\}$ as above. We then divide $S^d \setminus \{0\}$ into $k$ phases of the form
$F_i = \{x \in S^d : r_{i-1} < |x| \le r_i\}$, where $r_0 = 0$ and each subsequent $r_i$ is
defined so that

\[ h_0(F_i) = \frac{\tau}{k}. \]

For any phase $F_i$, assume that $r_{i-1} > \frac{3}{4} r_i$. Let $x$ be a vertex in $F_i$. A portion
$q$ of the area of each spherical "level" in $x + F_i$ lies in $L_i = B_0((3/8) r_i)$ by
Lemma 6.6. By rotational invariance it follows that $\ell_x(L_i) = q\,\ell_x(x + F_i) =
q\,\ell(F_i)$, so if $B$ is the event that $x$ has a shortcut destination $n_i(x)$ closer than
$r_{i-1}/2$, then

\[ P(B) = \ell_x(B_0((3/8) r_i)) \ge \frac{q\,h_0(F_i)}{\tau} = \frac{q}{k}. \]

By Lemma 6.2, if such an $n_i(x)$ exists, then $x$ has a neighbor within distance
$d/2$ of $z$, and greedy routing at least halves the distance to $z$ in the next
step. If $B$ fails to occur, then we know by Lemma 6.1 that greedy routing
will progress to a vertex closer to the target. The event $B$ is independent of
previous steps. Thus $h_0(F_i) \le k/q$, whence $\tau \le k^2/q$.

If, on the other hand, $r_{i-1} \le \frac{3}{4} r_i$ for all $i$, then

\[ r_1 \le \frac{3^{k-1}}{4^{k-1}} = \frac{4}{3n}. \]

Let $N$ be the number of vertices in $F_1$. By Observation 2.2,

\[ h_0(F_1) \le E[N] = n^d\,\mathrm{Vol}(r_1) \le c. \]

It follows that $\tau \le ck$, so the result holds when $k \ge cq$. □

Fig. 3. The circle around $y$ intersects the ball around $x$ in at least 1/8 of its points.

7. The rewiring algorithm revisited. Proposition 3.3 shows that, under 
the stationary distribution of the destination sampling algorithm introduced 
above, the marginal shortcut distribution at each point is balanced, and it 
is thus tempting to apply Theorem 5.2 to bound the greedy path-length. 
However, that theorem assumed that the shortcuts had been chosen inde- 
pendently at each vertex, which is not the case under Algorithm 3.1 which 
originally motivated the work. Showing that these dependencies do not neg- 
atively affect routing is an open problem, which we discuss in general terms 
in this section. 

There are two sources of dependencies between the shortcuts of neigh- 
boring vertices. First, there is a chance that they sampled the destination 
of the same walk. When p is large, this dependency is substantial, and we 
see a highly detrimental effect even in the simulations. By using a small p, 
however, this dependence is muted. Another, more subtle dependence has to 
do with the way the shortcuts of vertices around a vertex x may affect the 
destinations of the walks $x$ sees. In the directed cycle, if $x + 1$ has a shortcut
to $x - 10$, that will make it less likely for $x$ to see walks for places "beyond"
$x - 10$, since many such walks will have followed the shortcut at $x + 1$, and
thus skipped over $x$.

The first source of dependence, that of sampling from the same walk, can 
be handled by modifying the algorithm to make sure we do not sample more 
than once for each walk. Take $p < 1/n$ and, once a walk is completed, choose
to update exactly one of its links with probability pw where w is the length 
of the walk. Which link to update is then chosen uniformly from the walk. 
This way, the probability that a vertex updates its shortcut when hit by a 
walk is still always p, but we never sample two shortcuts from the same walk. 
The modified algorithm is less natural, but clearly a good approximation of 
the original for small values of p. Although it is more complicated, it is easier 
to analyze, since it allows for the simplifying assumption that each edge is 
chosen from a different greedy walk. 
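
The modified update rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function name, the dictionary representation of shortcuts, and the convention that the walk length $w$ counts the vertices visited before the destination are all choices made for the example.

```python
import random


def update_after_walk(shortcut, walk, destination, p, rng):
    """Apply the modified rule to one completed greedy walk.

    shortcut: dict mapping vertex -> current shortcut target (mutated in place)
    walk: vertices visited before reaching `destination`, in order
    Returns the vertex whose link was rewired, or None if no update occurred.
    """
    w = len(walk)
    if w == 0:
        return None
    if p * w > 1.0:
        raise ValueError("p * w must be a probability; take p < 1/n")
    if rng.random() < p * w:      # update exactly one link with probability p*w
        v = rng.choice(walk)      # the link to update is chosen uniformly
        shortcut[v] = destination
        return v
    return None
```

Each vertex on the walk is selected with probability $pw \cdot (1/w) = p$, matching the per-vertex update probability of the original rule, while at most one shortcut is ever drawn from a given walk.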

The other dependencies are more complicated, and there is no easy way 
to modify the algorithm to remove them. However, it is worth noting that 
it is hard to see why these dependencies (unlike the first type) would be 
destructive for greedy routing. In fact it makes sense that, if $x$ in our example
gets few walks destined beyond $x - 10$ because of the shortcut present at
$x + 1$, then it should also choose a shortcut to beyond $x - 10$ with a smaller
probability.

In the proof of Theorem 5.2 we use independence only to show that if 
the probability of having a shortcut out of a phase at the very furthest 
point is p, then the expected number of steps in the phase is bounded by 
1/p. There is little reason to believe this would not hold under the modified 
algorithm, since if the link from the furthest point does not take us out of 
the phase, then it either goes to a point within the phase, or overshoots the 
destination. If it goes to a point within the phase, then we follow it, and 
the presence of that shortcut should not interfere with those we see in the 
future. If, on the other hand, it overshoots, then by the argument above it 
should make it more likely that the succeeding ones do not do so, giving us 
a better probability of leaving the phase than in the independent case. 

Formalizing the requirements on the dependence, and proving that our 
stationary distribution indeed has the necessary properties, is the main open 
problem which we have yet to resolve. 

7.1. Computer simulation. Simulations indicate that the algorithm gives
results which scale as desired in the number of greedy steps, and that the
distribution approximates $1/(h_{n,d}\,d(x, y))$ for the one-dimensional grid.

The results in the directed one-dimensional grid can be seen in Figure 4.
To get these results, the graph is started with no shortcuts, and then the
algorithm is run $10n$ times to initialize the references. The value $p = 0.1$ is
used. The greedy distance is then measured as the average of 100,000 walks,
each updating the graph according to the algorithm. The effect of running
the algorithm, rather than freezing one configuration, seems to be to lower
the variance of the observed value.

Fig. 4. The expected greedy walk time of the selection algorithm, compared to selection
according to harmonic distances, in a cycle. [Plot: square root of the mean path-length
against log2 of N/1000, with curves labeled "algorithm" and "harmonic".]
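
The simulation procedure just described can be sketched roughly as below. The sketch assumes a directed cycle whose edges point from $x$ to $x - 1$ (mod $n$), one shortcut per vertex, and the update rule as it is described in this section (each vertex hit by a walk resamples its shortcut to the walk's destination with probability $p$). The function names and parameters are hypothetical, and the code is not the program used to produce Figure 4.

```python
import random


def greedy_walk(shortcut, n, start, dest):
    """Vertices visited (excluding dest) when routing greedily on a directed cycle."""

    def dist(v):
        # Number of ring steps from v to dest along edges x -> x - 1 (mod n).
        return (v - dest) % n

    walk, x = [], start
    while x != dest:
        walk.append(x)
        s = shortcut[x]
        # Follow the shortcut only if it beats the ring step without overshooting.
        if s is not None and dist(s) < dist(x) - 1:
            x = s
        else:
            x = (x - 1) % n
    return walk


def simulate(n, p, warmup, samples, seed=0):
    """Mean greedy path length while the rewiring keeps running."""
    rng = random.Random(seed)
    shortcut = [None] * n                 # the graph starts with no shortcuts
    total = 0
    for step in range(warmup + samples):
        start, dest = rng.randrange(n), rng.randrange(n)
        walk = greedy_walk(shortcut, n, start, dest)
        for v in walk:                    # each visited vertex resamples with prob. p
            if rng.random() < p:
                shortcut[v] = dest
        if step >= warmup:                # only measure after initialization
            total += len(walk)
    return total / samples
```

A call such as `simulate(n, 0.1, 10 * n, 100_000)` mirrors the settings quoted above; smaller values are advisable for a quick check.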

The square root of the mean greedy distance increases linearly as the 
graph size increases exponentially, just as we would expect. In fact, as can 
be seen, our algorithm leads to better simulation results than choosing from 
Kleinberg's distribution. Doubling the graph size is found to increase the 
square root of the greedy distance by about 0.41 when links are selected 
using our algorithm, compared to an increase of about 0.51 when Kleinberg's 
model is used. [In fact, in Kleinberg's model we can use (5) to calculate
numerically exact values for $\tau$, allowing us to confirm this figure.]
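
To see what these slopes mean for the path length itself, suppose, as a simplification that the text does not claim, that the linear fit $\sqrt{\tau(n)} = a + b \log_2 n$ held exactly, where $a$ and $b$ are hypothetical fit parameters. Then:

```latex
\[
\sqrt{\tau(2n)} - \sqrt{\tau(n)} = b,
\qquad
\tau(n) = b^2 \log_2^2 n + 2ab \log_2 n + a^2 .
\]
\[
b = 0.41 \;\Rightarrow\; b^2 \approx 0.17,
\qquad
b = 0.51 \;\Rightarrow\; b^2 \approx 0.26,
\qquad
\frac{0.41^2}{0.51^2} \approx 0.65 .
\]
```

Under that assumption both methods give square logarithmic growth, and the leading coefficient for the rewiring algorithm would be roughly two thirds of that for harmonic selection.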

In Figure 6 the marginal distribution of shortcut lengths is plotted. It is 
roughly harmonic in shape, except that destination sampling creates fewer 
links of length close to the size of the graph. This may be part of the reason 
why it is able to outperform Kleinberg's model: while the latter is asymp- 
totically correct, our algorithm takes into account finite size effects. (This 
reasoning is similar to that of the authors of [4]. Like them, we have no 
strong analytic arguments for why this should be the case, which makes it 
a tenuous argument at best.) 
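
For comparison, explicit harmonic selection on a directed cycle can be sketched as follows. The sketch assumes that a shortcut of length $k$ is chosen with probability proportional to $1/k$ for $k = 1, \dots, n - 1$ and that edges point from $x$ to $x - 1$; the exact normalization used in the paper is defined elsewhere, and the function names here are hypothetical.

```python
import random
from itertools import accumulate


def harmonic_weights(n):
    """P(length = k) proportional to 1/k for k = 1, ..., n - 1."""
    h = sum(1.0 / k for k in range(1, n))     # normalizing constant
    return [1.0 / (k * h) for k in range(1, n)]


def sample_shortcut(x, n, cum_weights, rng):
    """Shortcut target for vertex x, drawn from the harmonic distribution."""
    k = rng.choices(range(1, n), cum_weights=cum_weights)[0]
    return (x - k) % n
```

Precomputing the cumulative weights once with `list(accumulate(harmonic_weights(n)))` avoids recomputing them for every vertex.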

The algorithm has also been simulated to good effect using base graphs 
of higher dimensions. Figure 5 shows the mean greedy distance for two- 
dimensional grids of increasing size. Here also, the algorithm creates config- 
urations that seem to display square logarithmic growth, and which perform 
considerably better than explicit selection according to Kleinberg's model. 

Fig. 5. The expected greedy walk time of the selection algorithm, compared to selection
according to harmonic distances, in a two-dimensional base grid.



8. Conclusion. The study of navigable graphs is still in its infancy, but 
many interesting results have already been found, and the practical relevance 
to such fields as computer networks is beyond doubt. In this paper we have 
presented a different way of looking at the dynamics that cause graphs to 
be navigable, and we have presented an algorithm which may explain how 
navigable graphs arise naturally. The algorithm's simplicity also means that 
it can be useful in practice for generating graphs that can easily be searched,
an important property for many structures on the Internet. 

While many questions about these graphs in general, and our results in 
particular, remain unanswered, the prospects for going further with this 
work seem good. We are hopeful that these ideas will be fruitful, leading to 
further analysis of searching and routing in graphs of all kinds. 

Acknowledgments. Thanks to my advisers Olle Häggström and Devdatt
Dubhashi, as well as Ian Clarke who originally suggested the edge updating 
algorithm, and Jon Kleinberg for taking the time to listen to and reflect 
upon my ideas. 

REFERENCES 

[1] Barriere, L., Fraigniaud, P., Kranakis, E. and Krizanc, D. (2001). Efficient
routing in networks with long range contacts. In Proceedings of the 15th
International Symposium on Distributed Computing DISC 01 270-284. Springer,
Berlin.

[2] Clarke, I., Hong, T., Miller, S., Sandberg, O. and Wiley, B. (2002). Protecting
free expression online with Freenet. IEEE Internet Computing 6 40-49.

[3] Clarke, I., Hong, T., Sandberg, O. and Wiley, B. (2000). Freenet: A distributed
anonymous information storage and retrieval system. In Proceedings
of the ICSI Workshop on Design Issues in Anonymity and Unobservability
311-320. Springer, Berlin.

[4] Clauset, A. and Moore, C. (2003). How do networks become navigable? Preprint.

[5] Draief, M. and Ganesh, A. (2006). Efficient routing in Poisson small-world
networks. J. Appl. Probab. 43 678-686. MR2274792

[6] Duchon, P., Hanusse, N., Lebhar, E. and Schabanel, N. (2006). Could
any graph be turned into a small-world? Theoret. Comput. Sci. 355 96-103.
MR2212010

[7] Duchon, P., Hanusse, N., Lebhar, E. and Schabanel, N. (2006). Towards small
world emergence. In Proceedings of 18th ACM Symposium on Parallelism in
Algorithms and Architectures 225-232. ACM, New York.

[8] Eppstein, D. and Wang, J. Y. (2002). A steady state model for graph power laws.
In 2nd International Workshop on Web Dynamics.

[9] Franceschetti, M. and Meester, R. (2006). Navigation in small-world networks:
A scale-free continuum model. J. Appl. Probab. 43 1173-1180. MR2274645

[10] Kleinberg, J. (2000). Navigation in a small world. Nature 406 845.

[11] Kleinberg, J. (2000). The small-world phenomenon: An algorithmic perspective.
In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing
(STOC) 163-170. ACM, New York. MR2114529

[12] Kleinberg, J. (2001). Small-world phenomena and the dynamics of information. In
Advances in Neural Information Processing Systems (NIPS) 14 431-438. MIT,
Cambridge.

[13] Kleinberg, J. (2006). Complex networks and decentralized search algorithms. In
Proceedings of the International Congress of Mathematicians (ICM) III
1019-1044. Eur. Math. Soc., Zürich. MR2275717

[14] Kumar, R., Liben-Nowell, D., Novak, J., Raghavan, P. and Tomkins, A.
(2005). Theoretical analysis of geographic routing in social networks.
Technical Report MIT-CSAIL-TR-2005-040.

[15] Liben-Nowell, D., Novak, J., Kumar, R., Raghavan, P. and Tomkins, A.
(2005). Geographic routing in social networks. Proc. Natl. Acad. Sci. USA 102
11623-11628.

[16] Singh Manku, G. (2004). Know thy neighbor's neighbor: The power of lookahead
in randomized P2P networks. In Proceedings of the 36th ACM Symposium on
Theory of Computing (STOC) 54-63. ACM, New York.

[17] Martel, C. and Nguyen, V. (2004). Analyzing Kleinberg's (and other) small-world
models. In PODC '04: Proceedings of the Twenty-Third Annual ACM Symposium
on the Principles of Distributed Computing 179-188. ACM, New York.

[18] Milgram, S. (1961). The small world problem. Psychology Today 1 61.

[19] Newman, M. (2000). Models of the small world: A review. J. Statist. Phys. 101
819-841.

[20] Newman, M. and Watts, D. (1999). Renormalization group analysis of the
small-world network model. Phys. Lett. A 263 341-346. MR1732095

[21] Watts, D. J., Dodds, P. and Newman, M. (2002). Identity and search in social
networks. Science 296 1302-1305.

[22] Watts, D. J. and Strogatz, S. (1998). Collective dynamics of small world networks.
Nature 393 440.

[23] Zhang, H., Goel, A. and Govindan, R. (2004). Using the small-world model to
improve Freenet performance. Comput. Networks 46 555-574.

Division of Mathematical Statistics 
Department of Mathematical Sciences 
Chalmers and Goteborg University 
412 96 Goteborg 
Sweden 

E-mail: ossa@math.chalmers.se 



